Dense and Sparse Matrix Operations on the Cell Processor
Authors
Abstract
The slowing pace of commodity microprocessor performance improvements, combined with ever-increasing chip power demands, has become of utmost concern to computational scientists. Therefore, the high performance computing community is examining alternative architectures that address the limitations of modern superscalar designs. In this work, we examine STI’s forthcoming Cell processor: a novel, low-power architecture that combines a PowerPC core with eight independent SIMD processing units coupled with a software-controlled memory to offer high FLOP/s/Watt. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop an analytic framework to predict Cell performance on dense and sparse matrix operations, using a variety of algorithmic approaches. Results demonstrate Cell’s potential to deliver more than an order of magnitude better GFLOP/s per watt than the Intel Itanium2 and Cray X1 processors.
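For intuition, the kind of bound such an analytic framework produces can be illustrated in a few lines. The sketch below is a minimal, generic estimate, not the authors' actual model: it assumes computation on the eight SIMD cores overlaps with DMA transfers (double buffering), so predicted time is set by whichever resource saturates first. The peak-rate and bandwidth figures are illustrative assumptions.

```python
# Minimal sketch of a max(compute, memory) performance bound for a blocked
# kernel on a Cell-like chip. All parameter values are illustrative
# assumptions, not figures taken from the paper.

def predicted_gflops(flops, bytes_moved,
                     peak_gflops_per_core=25.6, num_cores=8,
                     dram_bandwidth_gbs=25.6):
    """Upper-bound GFLOP/s for a kernel doing `flops` operations and moving
    `bytes_moved` bytes, assuming perfect overlap of computation and DMA."""
    compute_time = flops / (peak_gflops_per_core * 1e9 * num_cores)
    memory_time = bytes_moved / (dram_bandwidth_gbs * 1e9)
    time = max(compute_time, memory_time)   # whichever resource saturates first
    return flops / time / 1e9

# Example: SpMV with ~2 flops and ~10 bytes per nonzero is bandwidth bound.
print(predicted_gflops(flops=2e6, bytes_moved=10e6))
```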
Similar References
Cache Oblivious Dense and Sparse Matrix Multiplication Based on Peano Curves
Cache-oblivious algorithms are designed to benefit from any existing cache hierarchy, regardless of cache size or architecture. In matrix computations, cache-oblivious methods are usually obtained from block-recursive formulations. In this article, we extend an existing cache-oblivious approach for matrix operations, which is based on Peano space-filling curves, for multiplication of sparse and...
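As a rough illustration of the block-recursive idea behind such cache-oblivious methods, the sketch below multiplies matrices by recursively splitting the largest dimension. It is not the Peano-curve ordering the article develops, and the base-case threshold of 64 is an arbitrary assumption.

```python
import numpy as np

def recursive_matmul(A, B, C, threshold=64):
    """Accumulate A @ B into C by recursively splitting the largest dimension."""
    m, k = A.shape
    _, n = B.shape
    if max(m, n, k) <= threshold:
        C += A @ B                      # small blocks fit in cache
        return
    if m >= n and m >= k:               # split rows of A and C
        h = m // 2
        recursive_matmul(A[:h], B, C[:h], threshold)
        recursive_matmul(A[h:], B, C[h:], threshold)
    elif n >= k:                        # split columns of B and C
        h = n // 2
        recursive_matmul(A, B[:, :h], C[:, :h], threshold)
        recursive_matmul(A, B[:, h:], C[:, h:], threshold)
    else:                               # split the shared dimension k
        h = k // 2
        recursive_matmul(A[:, :h], B[:h], C, threshold)
        recursive_matmul(A[:, h:], B[h:], C, threshold)

A, B = np.random.rand(300, 200), np.random.rand(200, 250)
C = np.zeros((300, 250))
recursive_matmul(A, B, C)
assert np.allclose(C, A @ B)
```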
Matrix Bidiagonalization on the Trident Processor
This paper discusses the implementation and evaluation of the reduction of a dense matrix to bidiagonal form on the Trident processor. The standard Golub and Kahan Householder bidiagonalization algorithm, which is rich in matrix-vector operations, and the LAPACK subroutine _GEBRD, which is rich in a mixture of vector, matrix-vector, and matrix operations, are simulated on the Trident processor....
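For reference, the Golub-Kahan procedure alternates left and right Householder reflections to drive a matrix to upper bidiagonal form. The sketch below is a minimal dense NumPy version for intuition only; it is unrelated to the Trident simulation and omits forming the orthogonal factors that _GEBRD also provides.

```python
import numpy as np

def house(x):
    """Householder vector v and scalar beta with (I - beta*v*v^T) x = ±||x|| e_1."""
    v = x.astype(float).copy()
    normx = np.linalg.norm(x)
    if normx == 0.0:
        return v, 0.0
    v[0] += np.sign(x[0]) * normx if x[0] != 0 else normx
    return v, 2.0 / np.dot(v, v)

def golub_kahan_bidiag(A):
    """Reduce A (m >= n) to upper bidiagonal form via alternating reflections."""
    B = A.astype(float).copy()
    m, n = B.shape
    for j in range(n):
        # Left reflection: zero out B[j+1:, j]
        v, beta = house(B[j:, j])
        B[j:, j:] -= beta * np.outer(v, v @ B[j:, j:])
        if j < n - 2:
            # Right reflection: zero out B[j, j+2:]
            v, beta = house(B[j, j + 1:])
            B[j:, j + 1:] -= beta * np.outer(B[j:, j + 1:] @ v, v)
    return B

A = np.random.rand(6, 4)
B = golub_kahan_bidiag(A)
# Only the diagonal and first superdiagonal should survive.
mask = np.triu(np.ones_like(B), k=2) + np.tril(np.ones_like(B), k=-1)
assert np.allclose(B * mask, 0)
```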
Algorithmic patterns for H-matrices on many-core processors
In this work, we consider the reformulation of hierarchical (H) matrix algorithms for many-core processors with a model implementation on graphics processing units (GPUs). H matrices approximate specific dense matrices, e.g., from discretized integral equations or kernel ridge regression, leading to log-linear time complexity in dense matrix-vector products. The parallelization of H matrix oper...
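The log-linear matrix-vector cost comes from keeping admissible off-diagonal blocks in low-rank factored form. The toy sketch below uses a flat two-by-two block partition as an illustrative simplification of that idea; real H-matrices use a recursive block tree, and all sizes and ranks here are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 512, 8                          # assumed block size and low rank
A11 = rng.standard_normal((n, n))      # dense diagonal blocks
A22 = rng.standard_normal((n, n))
U12, V12 = rng.standard_normal((n, r)), rng.standard_normal((n, r))
U21, V21 = rng.standard_normal((n, r)), rng.standard_normal((n, r))

def h_matvec(x):
    """y = A x with off-diagonal blocks A12 ~ U12 V12^T and A21 ~ U21 V21^T
    kept in factored form, so each costs two thin matvecs instead of O(n^2)."""
    x1, x2 = x[:n], x[n:]
    y1 = A11 @ x1 + U12 @ (V12.T @ x2)
    y2 = U21 @ (V21.T @ x1) + A22 @ x2
    return np.concatenate([y1, y2])

x = rng.standard_normal(2 * n)
dense = np.block([[A11, U12 @ V12.T], [U21 @ V21.T, A22]])
assert np.allclose(h_matvec(x), dense @ x)
```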
Chain Multiplication of Dense Matrices: Proposing a Shared Memory based Parallel Algorithm
Chain multiplication of matrices is widely used in scientific computing. It becomes more challenging when there is a large number of dense floating-point matrices, because floating-point operations take more time than integer operations. It would therefore be worthwhile to lower the time of such chain operations. Nowadays, every multicore processor system has built-in parallel computational power. This...
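One natural shared-memory formulation, sketched below purely for illustration (it is not necessarily the algorithm the paper proposes), reduces the chain pairwise in rounds so that the products within each round are independent and can run on separate cores.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_chain_matmul(mats, workers=4):
    """Multiply mats[0] @ mats[1] @ ... @ mats[-1] by pairwise tree reduction.
    NumPy releases the GIL inside matmul, so threads give real parallelism."""
    while len(mats) > 1:
        pairs = [(mats[i], mats[i + 1]) for i in range(0, len(mats) - 1, 2)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            reduced = list(pool.map(lambda ab: ab[0] @ ab[1], pairs))
        if len(mats) % 2:                 # odd-length chain: carry the last matrix over
            reduced.append(mats[-1])
        mats = reduced
    return mats[0]

chain = [np.random.rand(200, 200) for _ in range(8)]
assert np.allclose(parallel_chain_matmul(chain), np.linalg.multi_dot(chain))
```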
Dense matrix operations on a torus and a boolean cube
Algorithms for matrix multiplication and for Gauss-Jordan and Gaussian elimination on dense matrices on a torus and a boolean cube are presented and analyzed with respect to communication and arithmetic complexity. The number of elements of the matrices is assumed to be larger than the number of nodes in the processing system. The algorithms for matrix multiplication, triangulation, and forward...
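A classic example of such a torus algorithm is Cannon-style block multiplication, where operand blocks are skewed and then cyclically shifted around the mesh. The sketch below simulates a p x p torus sequentially with NumPy blocks; the shifting scheme is the textbook Cannon variant, used here as an illustrative stand-in rather than the authors' exact formulation.

```python
import numpy as np

def cannon_matmul(A, B, p):
    """Multiply A @ B on a simulated p x p torus of block 'processors'."""
    n = A.shape[0]
    b = n // p                                           # block size per node
    Ab = [[A[i*b:(i+1)*b, j*b:(j+1)*b].copy() for j in range(p)] for i in range(p)]
    Bb = [[B[i*b:(i+1)*b, j*b:(j+1)*b].copy() for j in range(p)] for i in range(p)]
    Cb = [[np.zeros((b, b)) for _ in range(p)] for _ in range(p)]
    # Initial skew: shift row i of A left by i, column j of B up by j.
    Ab = [[Ab[i][(j + i) % p] for j in range(p)] for i in range(p)]
    Bb = [[Bb[(i + j) % p][j] for j in range(p)] for i in range(p)]
    for _ in range(p):
        for i in range(p):
            for j in range(p):
                Cb[i][j] += Ab[i][j] @ Bb[i][j]          # local block product
        # Shift A blocks one step left and B blocks one step up around the torus.
        Ab = [[Ab[i][(j + 1) % p] for j in range(p)] for i in range(p)]
        Bb = [[Bb[(i + 1) % p][j] for j in range(p)] for i in range(p)]
    return np.block(Cb)

A, B = np.random.rand(120, 120), np.random.rand(120, 120)
assert np.allclose(cannon_matmul(A, B, p=4), A @ B)
```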
Journal title:
Volume/Issue:
Pages: -
Publication date: 2007